Modify ori file reader to deal with absence of EN (or not) at the end of the file #516
GallegoSav wants to merge 3 commits into cositools:develop
Updated data processing to drop NaN values from the dataframe.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
If the issue is that one particular ori file is malformed because the expected EN got dropped, why are we not fixing the file in question, instead of complicating the parser in a way that invites future bugs?
Further comments: Of the three .ori files used in the tutorials,
The test case .ori file I recently extracted from DC3_final_...1sbins... follows its format and so also lacks any trailing marker. The other three test case .ori files all have an EN marker. So before trying to fix the code, can we clarify what we consider the correct .ori format? Can MEGAlib itself ever produce an .ori file without the EN marker? If not, I would argue that we should fix the two files we are using without such markers and then have cosipy throw an error if the EN is missing. We could even spend a bit of extra time to check for the EN explicitly, rather than relying on an indirect test with the CSV parser, since .ori is no longer our primary input format now that we have FITS. In any case, I clearly need to regenerate the FITS files for _1sbins and _15sbins so they have the last line, and also fix that one test case .ori file.
Another comment on how we parse .ori files: the basic parsing call is
df = pd.read_csv(file, sep=r"\s+", skiprows=1, usecols=tuple(range(1,10)), header = None, comment = '#')
Do we actually want to support commenting out lines with '#'? Is this a feature of MEGAlib's ori files? If not, I would argue that it is more likely to cause unexpected behavior than to be useful.
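To make the concern about comment = '#' concrete, here is a minimal sketch of how the quoted read_csv call treats a line that starts with '#'. The sample text is made up for illustration (the OG keyword and the numeric values are assumptions, not taken from a real MEGAlib file); only the read_csv arguments come from the comment above.

```python
import io

import pandas as pd

# Made-up stand-in for an .ori file: one header line, then rows with a
# keyword followed by nine numeric fields. Values are illustrative only.
SAMPLE = """Type OrientationsGalactic
OG 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
#OG 2.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
OG 3.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
"""

# Same arguments as the parsing call quoted in the comment above.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",
    skiprows=1,
    usecols=tuple(range(1, 10)),
    header=None,
    comment="#",
)

# The row beginning with '#' is skipped without any warning.
print(len(df))
print(df.iloc[:, 0].tolist())
```

With this input the second data row never reaches the dataframe, so a stray '#' in a file would shorten the orientation history silently rather than raise an error.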
I know I was the one who suggested the workaround that @GallegoSav implemented in this PR (thanks @GallegoSav btw). I was trying to save some time, but @jdbuhler, if you are willing to fix the files (it seems you already did in #517, correct?) and add a check for
I don't think MEGALib comments out lines with
Yes, I fixed the two offending tutorial .ori files (which are now in the develop tree on wasabi). Are we good with replacing the existing files in DC3 with these? I can work on the explicit EN check outside the framework of CSV parsing. I'll just seek to the end of the file and look there.
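The seek-to-the-end check described above could look roughly like the sketch below. This is not the code from any cosipy PR; the function name has_trailing_en, the tail_bytes parameter, and the temporary sample files are all hypothetical and exist only for illustration.

```python
import os
import tempfile


def has_trailing_en(path, tail_bytes=64):
    """Return True if the last non-blank line of the file is exactly 'EN'.

    Hypothetical helper: reads only the final `tail_bytes` bytes instead of
    parsing the whole file.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))
        tail = f.read().decode("ascii", errors="replace")
    # Ignore trailing blank lines, then compare the last remaining line.
    lines = [line.strip() for line in tail.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == "EN"


def _write_tmp(text):
    # Write made-up sample content to a temporary file and return its path.
    fd, path = tempfile.mkstemp(suffix=".ori")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    return path


ROW = "OG 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n"
with_en = _write_tmp("Type OrientationsGalactic\n" + ROW + "EN\n")
without_en = _write_tmp("Type OrientationsGalactic\n" + ROW)

result_with_en = has_trailing_en(with_en)
result_without_en = has_trailing_en(without_en)
print(result_with_en, result_without_en)

os.remove(with_en)
os.remove(without_en)
```

A reader could call such a helper before the CSV parse and raise an error when it returns False, which keeps the format check separate from the dataframe handling.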
My preference is to leave those as is, and only fix the develop and DC4 folders. For DC3 I try to do hot fixes only when absolutely necessary. In this case the error from missing the last line is pretty small, and it doesn't prevent the code from running. Otherwise we would have to change the checksum of the release as well, not just develop. I'm also tagging @ckarwin to see what he thinks.
Sounds good, thank you.
@israelmcmc and @GallegoSav, please see PR #533.
This PR addresses #503.
If there is an EN at the end of the ori file, dropna will remove it. If not, nothing changes.
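A small sketch of the behaviour described here, using made-up sample text rather than a real .ori file (the OG keyword and values are assumptions; read_ori is a hypothetical helper that wraps the read_csv call quoted earlier in the thread). It also shows the side effect raised in the review: with default arguments, dropna removes any row containing a NaN, not only the EN row.

```python
import io

import pandas as pd

HEADER = "Type OrientationsGalactic\n"
ROWS = (
    "OG 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n"
    "OG 2.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n"
)


def read_ori(text):
    # Hypothetical wrapper around the parsing call quoted in the thread.
    return pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",
        skiprows=1,
        usecols=tuple(range(1, 10)),
        header=None,
        comment="#",
    )


# With a trailing EN, the parser yields one extra row whose selected
# columns are all NaN; dropna() removes it.
raw_with_en = read_ori(HEADER + ROWS + "EN\n")
clean_with_en = raw_with_en.dropna()

# Without EN there is no NaN row, so dropna() changes nothing.
raw_without_en = read_ori(HEADER + ROWS)
clean_without_en = raw_without_en.dropna()

# Caveat: a truncated data line is padded with NaN and is dropped as well.
raw_truncated = read_ori(HEADER + ROWS + "OG 3.0 2.0\n")
clean_truncated = raw_truncated.dropna()

print(len(raw_with_en), len(clean_with_en))
print(len(raw_without_en), len(clean_without_en))
print(len(raw_truncated), len(clean_truncated))
```

The last case is why an explicit EN check may be preferable: the dropna approach cannot distinguish an end marker from a damaged data row.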